PCSM-167: Restore eventsRead during recovery#150

Merged
inelpandzic merged 3 commits into main from PCSM-167-fix-events-read
Nov 18, 2025

@inelpandzic inelpandzic commented Nov 16, 2025

During recovery, eventsRead was not restored from the checkpoint, which led to an inconsistent status after recovery in which we had applied more events than we had read:

❯ pcsm status
{
  "ok": true,
  "state": "running",
  "info": "Replicating Changes",
  "lagTimeSeconds": 2,
  "eventsRead": 322,     <======================
  "eventsApplied": 354, <======================
  "lastReplicatedOpTime": {
    "ts": "1763282460.1",
    "isoDate": "2025-11-16T08:41:00Z"
  },
  "initialSync": {
    "estimatedCloneSizeBytes": 12540,
    "clonedSizeBytes": 12540,
    "completed": true,
    "cloneCompleted": true
  }
}

That is because, when we recover, eventsApplied is restored correctly while eventsRead starts again from zero.

On top of this, the eventsRead value was also stored incorrectly in the checkpoint: the stored value was r.eventsRead.Load(), which is not the right value to recover from. Checkpointing is done periodically, and eventsRead is generally larger than eventsApplied in the PCSM status because some events have been read but not yet applied, so PCSM can write a checkpoint while that difference exists.
But when we perform the recovery, we need to restore eventsRead to the value of eventsApplied, since the difference (all the events that were read but not applied before the crash) is gone and will be re-read when we recover from the last checkpoint.
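
To illustrate the intended behavior, here is a minimal sketch and not the actual PCSM code. The `checkpoint` and `replicator` types, the `snapshot` and `restore` methods, and the counter values are all hypothetical; only the `eventsRead` and `eventsApplied` names come from this PR. It shows one way to make sure both counters agree after recovery, by restoring them from the applied count:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// checkpoint is a hypothetical persisted snapshot; the real PCSM
// checkpoint layout may differ.
type checkpoint struct {
	EventsApplied int64
}

// replicator is a hypothetical stand-in that holds the two status counters.
type replicator struct {
	eventsRead    atomic.Int64
	eventsApplied atomic.Int64
}

// snapshot records only the applied count. Events that were read but not
// yet applied are deliberately left out, because they are lost on a crash.
func (r *replicator) snapshot() checkpoint {
	return checkpoint{EventsApplied: r.eventsApplied.Load()}
}

// restore sets both counters to the applied count from the checkpoint,
// since any read-but-unapplied events will be re-read after recovery.
func (r *replicator) restore(cp checkpoint) {
	r.eventsApplied.Store(cp.EventsApplied)
	r.eventsRead.Store(cp.EventsApplied)
}

func main() {
	// Illustrative values: 6 events were read but not yet applied
	// when the checkpoint was taken.
	running := &replicator{}
	running.eventsRead.Store(360)
	running.eventsApplied.Store(354)
	cp := running.snapshot()

	// Simulate a crash followed by recovery from the last checkpoint.
	recovered := &replicator{}
	recovered.restore(cp)
	fmt.Println(recovered.eventsRead.Load(), recovered.eventsApplied.Load()) // prints "354 354"
}
```

With this shape, eventsRead can never start below eventsApplied after recovery, so the status no longer reports more applied events than read ones.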

@inelpandzic inelpandzic merged commit 5f47aa8 into main Nov 18, 2025
29 of 31 checks passed